Fast Isomorphism Testing of Graphs with Regularly-Connected Components



Abstract

The Graph Isomorphism problem is of both theoretical and practical interest. In this paper we present an algorithm, called conauto-1.2, that efficiently tests whether two graphs are isomorphic, and finds an isomorphism if they are. This algorithm is an improved version of the algorithm conauto, which has been shown to be very fast for random graphs and several families of hard graphs [9]. In this paper we establish a new theorem that allows, at very low cost, the easy discovery of many automorphisms. This result is especially suited for graphs with regularly connected components, and can be applied in any isomorphism testing and canonical labeling algorithm to drastically improve its performance. In particular, algorithm conauto-1.2 is obtained by applying this result to conauto. The resulting algorithm preserves all the nice features of conauto, but drastically improves the testing of graphs with regularly connected components. We ran extensive experiments, which show that the most popular algorithms (namely, nauty [10, 11] and bliss [8]) cannot compete with conauto-1.2 for these graph families.



1 Introduction 



The Graph Isomorphism problem (GI) is of both theoretical and practical interest. GI tests whether there is a one-to-one mapping between the vertices of two graphs that preserves the arcs. This problem has applications in many fields, like pattern recognition and computer vision [3], data mining [17], VLSI layout validation [1], and chemistry [5, 15]. At the theoretical level, its main interest is that it is not known whether GI is in P or whether it is NP-complete.

Related Work It would be nice to find a complete graph invariant^1 computable in polynomial time, which would allow testing graphs for isomorphism in polynomial time. However, no such invariant is known, and it is unlikely to exist. Note, though, that there are many simple instances of GI, and that many families of graphs can be tested for isomorphism in polynomial time: trees [2], planar graphs [7], graphs of bounded degree [6], etc. For a review of the theoretical results related to GI see [13].

The most interesting practical approaches to the GI problem are (1) the direct approach, which uses backtracking to find a match between the graphs, using techniques to prune the search tree, and (2) computing a certificate^2 of each of the graphs to test, and then comparing the certificates directly. The direct approach can be used for both graph and subgraph isomorphism (e.g., vf2 [4] and Ullmann's [16] algorithms), but has problems when dealing with highly regular graphs with a relatively small automorphism group. In this case, even the use of heuristics to prune the search space frequently does not prevent the proposed algorithms from exploring paths equivalent to those already tested. To avoid this, it is necessary to keep track of discovered automorphisms, and use this information to aggressively prune the search space. In the certificate approach, on the other hand, two graphs are isomorphic if and only if their canonical labelings yield the same graph, so their certificates can be compared directly. This is the approach used by the well-known algorithm nauty [10, 11], and by the algorithm bliss [8] (which has better performance than nauty for some graph families). This approach requires computing the full automorphism group of the graph (at least a set of generators). In most cases, these algorithms are faster than the ones that use the direct approach.
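To make the certificate idea concrete, here is a minimal brute-force sketch. It is not how nauty or bliss work: for tiny graphs, a certificate can be taken as the lexicographically smallest adjacency matrix over all vertex relabelings. The names `certificate` and `are_isomorphic` are hypothetical, vertices are assumed to be numbered 0..n-1, and the cost is n!, so this only illustrates why comparing certificates decides isomorphism.

```python
from itertools import permutations

def certificate(n, arcs):
    """Smallest flattened 0/1 adjacency matrix over all n! relabelings.
    Exponential: a canonical form for tiny graphs only."""
    arcs = set(arcs)
    best = None
    for perm in permutations(range(n)):
        # perm[v] is the new label of vertex v; inv maps new labels back
        inv = [0] * n
        for v, new in enumerate(perm):
            inv[new] = v
        code = tuple(1 if (inv[i], inv[j]) in arcs else 0
                     for i in range(n) for j in range(n))
        if best is None or code < best:
            best = code
    return best

def are_isomorphic(n_g, arcs_g, n_h, arcs_h):
    # Two graphs are isomorphic iff their certificates coincide.
    return n_g == n_h and certificate(n_g, arcs_g) == certificate(n_h, arcs_h)
```

For example, the directed paths 0→1→2 and 2→0→1 get the same certificate, while a path and an out-star on three vertices do not.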

Algorithm conauto [9] uses a new approach to graph isomorphism^3. It combines the use of discovered automorphisms with a backtracking algorithm that tries to find a match of the graphs without the need of generating a canonical form. To test graphs of n nodes, conauto uses $O(n^2 \log n)$ bits of memory. Additionally, it runs in polynomial time (in n) with high probability for random graphs. In real experiments, for several families of interesting hard graphs, conauto is faster than nauty and vf2, as shown in [9]. For example, Miyazaki's graphs [12] are very hard for vf2, nauty, and bliss, but conauto handles them efficiently. However, it was found in [9] that some families of graphs built from regularly connected components (in particular, from strongly regular graphs) are not handled efficiently by any of the algorithms evaluated. While conauto runs fast when the tested graphs are isomorphic, it is very slow when the graphs are not isomorphic.

Contributions In this paper we establish a new theorem that allows, at very low cost, the easy discovery of many automorphisms. This result is especially suited for graphs with regularly connected components, and can be applied in any direct isomorphism testing or canonical labeling algorithm to drastically improve its performance.

1 A complete graph invariant is a function on a graph that gives the same result for isomorphic graphs, and different results for non-isomorphic graphs.

2 A certificate of a graph is a canonical labeling of the graph.

3 A preliminary version of conauto has been included in the LEDA C++ class library of algorithms [14],

Then, a new algorithm, called conauto-1.2, is proposed. This algorithm is obtained by improving 
conauto with techniques derived from the above mentioned theorem. In particular, conauto-1.2 
reduces the backtracking needed to explore every plausible path in the search space with respect to 
conauto. The resulting algorithm preserves all the nice features of conauto, but drastically improves 
the testing of some graphs, like those with regularly connected components. 

We have carried out experiments to compare the practical performance of conauto-1.2, nauty, 
and bliss, with different families of graphs built by regularly connecting copies of small components. 
The experiments show that, for this type of construction, conauto-1.2 is not only the fastest, but also has a very regular behavior.



Structure In Section 2 we define the basic theoretical concepts used in algorithm conauto-1.2 and present the theorems on which its correctness relies. Next, in Section 3 we describe the algorithm itself. Then, Section 4 describes the graph families used for the tests, and shows the practical performance of conauto-1.2 compared with conauto, nauty and bliss for these families. Finally, we put forward our conclusions and propose new ways to improve conauto-1.2.



2 Theoretical Foundation 

2.1 Basic Definitions 

A directed graph $G = (V, R)$ consists of a finite non-empty set $V$ of vertices and a binary relation $R$, i.e., a subset $R \subseteq V \times V$. The elements of $R$ are called arcs. An arc $(u, v) \in R$ is considered to be oriented from $u$ to $v$. An undirected graph is a graph whose arc set $R$ is symmetric, i.e., $(u, v) \in R$ iff $(v, u) \in R$. From now on, we will use the term graph to refer to a directed graph.

Definition 1 An isomorphism of graphs $G = (V_G, R_G)$ and $H = (V_H, R_H)$ is a bijection between the vertex sets of $G$ and $H$, $f : V_G \to V_H$, such that $(v, u) \in R_G \iff (f(v), f(u)) \in R_H$. Graphs $G$ and $H$ are called isomorphic, written $G \cong H$, if there is at least one isomorphism of them. An automorphism of $G$ is an isomorphism of $G$ and itself.



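Given a graph $G = (V, R)$, $R$ can be represented by an adjacency matrix $\mathrm{Adj}(G) = A$ with size $|V| \times |V|$ in the following way:

$$
A_{uv} = \begin{cases}
0 & \text{if } (u, v) \notin R \wedge (v, u) \notin R \\
1 & \text{if } (u, v) \notin R \wedge (v, u) \in R \\
2 & \text{if } (u, v) \in R \wedge (v, u) \notin R \\
3 & \text{if } (u, v) \in R \wedge (v, u) \in R
\end{cases}
$$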
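The four-valued encoding can be sketched in a few lines. This is an illustrative helper (the name `adjacency_matrix` is ours), assuming vertices are numbered 0..|V|-1 and arcs are given as ordered pairs.

```python
def adjacency_matrix(n, arcs):
    """Adj(G) as a list of lists: 0 = no arc, 1 = only (v, u),
    2 = only (u, v), 3 = arcs in both directions."""
    arcs = set(arcs)
    # 2 * [(u,v) in R] + [(v,u) in R] yields exactly the four cases
    return [[2 * ((u, v) in arcs) + ((v, u) in arcs) for v in range(n)]
            for u in range(n)]
```

For the arcs (0,1), (1,0), (1,2) this gives [[0, 3, 0], [3, 0, 2], [0, 1, 0]].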



Let $G = (V, R)$ be a graph, and $\mathrm{Adj}(G) = A$ its adjacency matrix. Let $V_1 \subseteq V$ and $v \in V$. The available degree of $v$ in $V_1$ under $G$, denoted $\mathrm{ADeg}(v, V_1, G)$, is the degree of $v$ with respect to $V_1$, i.e., the 3-tuple $(D_3, D_2, D_1)$ where $D_i = |\{u \in V_1 : A_{vu} = i\}|$ for $i \in \{1, 2, 3\}$. The predicate $\mathrm{HasLinks}(v, V_1, G)$ says whether $v$ has any neighbor in $V_1$, i.e., $\mathrm{ADeg}(v, V_1, G) \neq (0, 0, 0)$. Extending the notation, let $V_1, V_2 \subseteq V$; if $\forall u, v \in V_1$, $\mathrm{ADeg}(u, V_2, G) = \mathrm{ADeg}(v, V_2, G) = d$, then we denote $\mathrm{ADeg}(V_1, V_2, G) = d$. $\mathrm{HasLinks}(V_1, V_2, G)$ is defined similarly.

We will say that a 3-tuple $(D_3, D_2, D_1) \prec (E_3, E_2, E_1)$ when the first one precedes the second one in lexicographic order. This notation will be used to order the available degrees of vertices and sets.
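A minimal sketch of ADeg and HasLinks over such a matrix follows, assuming the matrix is a list of lists with entries in {0, 1, 2, 3}; the function names are hypothetical. Python compares tuples lexicographically, so `<` on the returned tuples coincides with ≺.

```python
def adeg(A, v, subset):
    """Available degree of v in `subset`: (D3, D2, D1), where Di counts
    the vertices u in subset with A[v][u] == i."""
    counts = {1: 0, 2: 0, 3: 0}
    for u in subset:
        if A[v][u]:
            counts[A[v][u]] += 1
    return (counts[3], counts[2], counts[1])

def has_links(A, v, subset):
    # v has a neighbor in subset iff its available degree is not (0, 0, 0)
    return adeg(A, v, subset) != (0, 0, 0)
```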
2.2 Specific Notation and Definitions for the Algorithms

It will be necessary to introduce some specific notation to be used in the specification of our 
algorithms. Like other isomorphism testing algorithms, ours relies on vertex classification. Let us start by defining what a partition is, and the partition concatenation operation.

A partition of a set $S$ is a sequence $\mathcal{S} = (S_1, \ldots, S_r)$ of disjoint nonempty subsets of $S$ such that $S = \bigcup_{i=1}^{r} S_i$. The sets $S_i$ are called the cells of $\mathcal{S}$. The empty partition will be denoted by $\emptyset$.

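Definition 2 Let $\mathcal{S} = (S_1, \ldots, S_r)$ and $\mathcal{T} = (T_1, \ldots, T_s)$ be partitions of two disjoint sets $S$ and $T$, respectively. The concatenation of $\mathcal{S}$ and $\mathcal{T}$, denoted $\mathcal{S} \circ \mathcal{T}$, is the partition $(S_1, \ldots, S_r, T_1, \ldots, T_s)$. Clearly, $\emptyset \circ \mathcal{S} = \mathcal{S} = \mathcal{S} \circ \emptyset$.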

Let $G = (V, R)$ be a graph, $v \in V$, $V_1 \subseteq V \setminus \{v\}$. The vertex partition of $V_1$ by $v$, denoted $\mathrm{PartitionByVertex}(V_1, v, G)$, is a partition $(S_1, \ldots, S_r)$ of $V_1$ such that for all $i, j \in \{1, \ldots, r\}$, $i > j$ implies $\mathrm{ADeg}(S_i, \{v\}, G) \prec \mathrm{ADeg}(S_j, \{v\}, G)$. Let $V_1, V_2 \subseteq V$. The set partition of $V_1$ by $V_2$, denoted $\mathrm{PartitionBySet}(V_1, V_2, G)$, is a partition $(S_1, \ldots, S_r)$ of $V_1$ such that for all $i, j \in \{1, \ldots, r\}$, $i > j$ implies $\mathrm{ADeg}(S_i, V_2, G) \prec \mathrm{ADeg}(S_j, V_2, G)$.
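Both partition functions group the vertices of $V_1$ by their available degree and list the groups in decreasing ≺ order, since $i > j$ means the later cell has the smaller degree. The sketch below assumes the matrix representation used above; `partition_by_set` and `partition_by_vertex` are hypothetical names, and vertices inside a cell are sorted only to make the output deterministic.

```python
from collections import defaultdict

def adeg(A, v, subset):
    counts = [0, 0, 0, 0]          # counts[i] = #u in subset with A[v][u] == i
    for u in subset:
        counts[A[v][u]] += 1
    return (counts[3], counts[2], counts[1])

def partition_by_set(A, cell, pivot_set):
    """Group `cell` by ADeg towards pivot_set; the cell with the
    lexicographically largest degree comes first."""
    groups = defaultdict(list)
    for w in cell:
        groups[adeg(A, w, pivot_set)].append(w)
    return [sorted(groups[d]) for d in sorted(groups, reverse=True)]

def partition_by_vertex(A, cell, v):
    # The vertex partition is the set partition with a singleton pivot set.
    return partition_by_set(A, cell, [v])
```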

Definition 3 Let $G = (V, R)$ be a graph, and $\mathcal{S} = (S_1, \ldots, S_r)$ a partition of $V$. Let $v \in S_x$ for some $x \in \{1, \ldots, r\}$. The vertex refinement of $\mathcal{S}$ by $v$, denoted $\mathrm{VertexRefinement}(\mathcal{S}, v, G)$, is the partition $\mathcal{T} = \mathcal{T}_1 \circ \cdots \circ \mathcal{T}_r$ such that for all $i \in \{1, \ldots, r\}$, $\mathcal{T}_i$ is the empty partition if $\neg\mathrm{HasLinks}(S_i, V, G)$, and $\mathrm{PartitionByVertex}(S_i \setminus \{v\}, v, G)$ otherwise. $S_x$ is the pivot set and $v$ is the pivot vertex.

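Definition 4 Let $G = (V, R)$ be a graph, and $\mathcal{S} = (S_1, \ldots, S_r)$ a partition of $V$. Let $P = S_x$ for some $x \in \{1, \ldots, r\}$ be a given pivot set. The set refinement of $\mathcal{S}$ by $P$, denoted $\mathrm{SetRefinement}(\mathcal{S}, P, G)$, is the partition $\mathcal{T} = \mathcal{T}_1 \circ \cdots \circ \mathcal{T}_r$ such that for all $i \in \{1, \ldots, r\}$, $\mathcal{T}_i$ is the empty partition if $\neg\mathrm{HasLinks}(S_i, V, G)$, and $\mathrm{PartitionBySet}(S_i, P, G)$ otherwise.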
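Definitions 3 and 4 can be sketched as follows. Two assumptions need flagging: HasLinks for a cell is evaluated against the union of all cells of the current partition (i.e., on the induced subgraph, as the Appendix does with $G_{W^{i-1}}$), and HasLinks of a set is read as "some vertex of the cell has a neighbor there". All names are hypothetical.

```python
from collections import defaultdict

def adeg(A, v, subset):
    counts = [0, 0, 0, 0]
    for u in subset:
        counts[A[v][u]] += 1
    return (counts[3], counts[2], counts[1])

def partition_by_set(A, cell, pivot_set):
    groups = defaultdict(list)
    for w in cell:
        groups[adeg(A, w, pivot_set)].append(w)
    return [sorted(groups[d]) for d in sorted(groups, reverse=True)]

def has_links(A, cell, universe):
    # Assumption: a cell has links if some vertex in it has a neighbor in universe.
    return any(adeg(A, w, universe) != (0, 0, 0) for w in cell)

def vertex_refinement(A, partition, v):
    """Drop linkless cells; split the others by v (v itself is removed)."""
    universe = [w for cell in partition for w in cell]
    result = []
    for cell in partition:
        if has_links(A, cell, universe):
            rest = [w for w in cell if w != v]
            result += partition_by_set(A, rest, [v])   # concatenation
    return result

def set_refinement(A, partition, pivot_index):
    """Drop linkless cells; split the others by the pivot cell."""
    universe = [w for cell in partition for w in cell]
    pivot = partition[pivot_index]
    result = []
    for cell in partition:
        if has_links(A, cell, universe):
            result += partition_by_set(A, cell, pivot)
    return result
```

On the undirected path 0-1-2-3, the degree partition [[1, 2], [0, 3]] is not split further by any set refinement, while a vertex refinement by 1 yields [[2], [0], [3]].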

Having presented the refinements that may be applied to partitions, we can build sequences of partitions in which an initial partition (for example, the one with a single cell containing all the vertices of a graph) is iteratively refined using the two refinements defined above. Each refinement step is tagged as VERTEX (a vertex refinement whose pivot set has only one vertex), SET (a set refinement, used when a set refinement is possible with some pivot set), or BACKTRACK (a vertex refinement whose pivot set has more than one vertex).

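Definition 5 Let $G = (V, R)$ be a graph. A sequence of partitions for graph $G$ is a tuple $(\mathbf{S}, \mathbf{R}, \mathbf{P})$, where $\mathbf{S} = (\mathcal{S}^0, \ldots, \mathcal{S}^t)$ are the partitions themselves, $\mathbf{R} = (R^0, \ldots, R^{t-1})$ indicates the type of refinement applied at each step, and $\mathbf{P} = (P^0, \ldots, P^{t-1})$ chooses the pivot set used for each refinement step, such that all the following statements hold:

1. For all $i \in \{0, \ldots, t-1\}$, $R^i \in \{\mathrm{VERTEX}, \mathrm{SET}, \mathrm{BACKTRACK}\}$, and $P^i \in \{1, \ldots, r_i\}$, where $r_i$ is the number of cells of $\mathcal{S}^i$.

2. For all $i \in \{0, \ldots, t-1\}$, let $\mathcal{S}^i = (S^i_1, \ldots, S^i_{r_i})$, $V^i = \bigcup_{j=1}^{r_i} S^i_j$. Then:

(a) $R^i = \mathrm{SET}$ implies $\mathcal{S}^{i+1} = \mathrm{SetRefinement}(\mathcal{S}^i, S^i_{P^i}, G)$.

(b) $R^i \neq \mathrm{SET}$ implies $\mathcal{S}^{i+1} = \mathrm{VertexRefinement}(\mathcal{S}^i, v, G)$ for some $v \in S^i_{P^i}$.

3. Let $\mathcal{S}^t = (S^t_1, \ldots, S^t_{r_t})$, $V^t = \bigcup_{j=1}^{r_t} S^t_j$; then for all $S^t_x \in \mathcal{S}^t$, $|S^t_x| = 1$ or $\neg\mathrm{HasLinks}(S^t_x, V^t, G)$.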



For convenience, for all $l \in \{0, \ldots, t-1\}$, by level $l$ we refer to the tuple $(\mathcal{S}^l, R^l, P^l)$ in a sequence of partitions. Level $t$ is identified by $\mathcal{S}^t$, since $R^t$ and $P^t$ are not defined.

We will now introduce the concept of compatibility among partitions, and then define compatibility of sequences of partitions. Let $\mathcal{S} = (S_1, \ldots, S_r)$ be a partition of the set of vertices of a graph $G = (V_G, R_G)$, and let $\mathcal{T} = (T_1, \ldots, T_s)$ be a partition of the set of vertices of a graph $H = (V_H, R_H)$. $\mathcal{S}$ and $\mathcal{T}$ are said to be compatible under $G$ and $H$ respectively if $|\mathcal{S}| = |\mathcal{T}|$ (i.e., $r = s$), and for all $i \in \{1, \ldots, r\}$, $|S_i| = |T_i|$ and $\mathrm{ADeg}(S_i, V_G, G) = \mathrm{ADeg}(T_i, V_H, H)$.
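A possible check of partition compatibility, using the same matrix representation. If the vertices of a cell disagree on their available degree, ADeg of the cell is undefined, and this sketch treats the pair as incompatible (our assumption). Names are hypothetical.

```python
def adeg(A, v, subset):
    counts = [0, 0, 0, 0]
    for u in subset:
        counts[A[v][u]] += 1
    return (counts[3], counts[2], counts[1])

def cell_adeg(A, cell, universe):
    """ADeg of a whole cell, or None when its vertices disagree (undefined)."""
    degrees = {adeg(A, w, universe) for w in cell}
    return degrees.pop() if len(degrees) == 1 else None

def compatible(A_G, S, A_H, T):
    """Is partition S (of G's vertices) compatible with T (of H's vertices)?"""
    if len(S) != len(T):
        return False
    V_G, V_H = range(len(A_G)), range(len(A_H))
    for S_i, T_i in zip(S, T):
        if len(S_i) != len(T_i):
            return False
        d = cell_adeg(A_G, S_i, V_G)
        if d is None or d != cell_adeg(A_H, T_i, V_H):
            return False
    return True
```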

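Definition 6 Let $G = (V_G, R_G)$ and $H = (V_H, R_H)$ be two graphs. Let $Q_G = (\mathbf{S}_G, \mathbf{R}_G, \mathbf{P}_G)$ and $Q_H = (\mathbf{S}_H, \mathbf{R}_H, \mathbf{P}_H)$ be two sequences of partitions for graphs $G$ and $H$ respectively. $Q_G$ and $Q_H$ are said to be compatible sequences of partitions if:

1. $|\mathbf{S}_G| = |\mathbf{S}_H| = t + 1$, $|\mathbf{R}_G| = |\mathbf{R}_H| = |\mathbf{P}_G| = |\mathbf{P}_H| = t$.

2. Let $\mathbf{R}_G = (R^0_G, \ldots, R^{t-1}_G)$, $\mathbf{R}_H = (R^0_H, \ldots, R^{t-1}_H)$, $\mathbf{P}_G = (P^0_G, \ldots, P^{t-1}_G)$, $\mathbf{P}_H = (P^0_H, \ldots, P^{t-1}_H)$, $\mathbf{S}_G = (\mathcal{S}^0, \ldots, \mathcal{S}^t)$, $\mathbf{S}_H = (\mathcal{T}^0, \ldots, \mathcal{T}^t)$. For all $i \in \{0, \ldots, t-1\}$, $R^i_G = R^i_H$, $P^i_G = P^i_H$, and $\mathcal{S}^i$ and $\mathcal{T}^i$ are compatible under $G$ and $H$ respectively.

3. Let $\mathcal{S}^t = (S^t_1, \ldots, S^t_r)$, $\mathcal{T}^t = (T^t_1, \ldots, T^t_r)$; then for all $x, y \in \{1, \ldots, r\}$, $\mathrm{ADeg}(S^t_x, S^t_y, G) = \mathrm{ADeg}(T^t_x, T^t_y, H)$.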

The following theorem shows that having compatible sequences of partitions is equivalent to 
being isomorphic. 

Theorem 1 ([9]) Two graphs $G$ and $H$ are isomorphic if and only if there are two compatible sequences of partitions $Q_G$ and $Q_H$ for graphs $G$ and $H$ respectively.

In order to properly handle automorphisms, sequences of partitions will be extended with vertex equivalence information. Two vertices $u, v \in V$ of a graph $G = (V, R)$ are equivalent, denoted $u \equiv v$, if there is an automorphism $f$ of $G$ such that $f(u) = v$. A vertex $w \in V$ is fixed by $f$ if $f(w) = w$. When two vertices are equivalent, they are said to belong to the same orbit. The set of all the orbits of a graph is called the orbit partition. Our algorithm performs a partial computation of the orbit partition, which is computed incrementally, starting from the singleton partition. Since our algorithm performs a limited search for automorphisms, it may stop before the orbit partition is actually found. Therefore, we introduce the notion of semiorbit partition, and extend the sequence of partitions to include a semiorbit partition.

Definition 7 Let $G = (V, R)$ be a graph. A semiorbit partition of $G$ is any partition $\mathcal{O} = \{O_1, \ldots, O_k\}$ of $V$ such that $\forall i \in \{1, \ldots, k\}$, $v, u \in O_i$ implies $v \equiv u$.
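A semiorbit partition can be maintained incrementally with a union-find structure: start from singletons and, whenever an automorphism $f$ is discovered, merge each vertex $v$ with $f(v)$. This is a sketch of one standard way to do it; the text does not say how conauto stores its semiorbits, and the class name is hypothetical. Every merge is justified by an automorphism (compositions of automorphisms are automorphisms), so each cell only ever contains equivalent vertices, as Definition 7 requires.

```python
class SemiorbitPartition:
    """Union-find over vertices 0..n-1, starting from the singleton partition."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]   # path halving
            v = self.parent[v]
        return v

    def add_automorphism(self, f):
        # f is a permutation given as a list: vertex v is mapped to f[v]
        for v, fv in enumerate(f):
            rv, rf = self.find(v), self.find(fv)
            if rv != rf:
                self.parent[rf] = rv

    def cells(self):
        groups = {}
        for v in range(len(self.parent)):
            groups.setdefault(self.find(v), []).append(v)
        return sorted(groups.values())
```

On the 4-cycle 0-1-2-3, adding the reflection [0, 3, 2, 1] gives the cells [[0], [1, 3], [2]]; adding [1, 0, 3, 2] as well merges everything into one orbit.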

Definition 8 An extended sequence of partitions $E$ for a graph $G = (V, R)$ is a tuple $(Q, \mathcal{O})$, where $Q$ is a sequence of partitions, denoted $\mathrm{SeqPart}(E)$, and $\mathcal{O}$ is a semiorbit partition of $G$, denoted $\mathrm{Orbits}(E)$.

Finally, we introduce a notation for the number of vertex refinements tagged BACKTRACK, since it will be used to choose the target sequence of partitions to be reproduced. Let $Q = (\mathbf{S}, \mathbf{R}, \mathbf{P})$ be a sequence of partitions, and let $\mathbf{R} = (R^0, \ldots, R^{t-1})$. Then, $\mathrm{BacktrackAmount}(Q) = |\{i : i \in \{0, \ldots, t-1\} \wedge R^i = \mathrm{BACKTRACK}\}|$.
2.3 Components Theorem

It was observed in [9] that conauto is very efficient at finding isomorphisms for unions of strongly regular graphs, but inefficient at detecting that two such unions are not isomorphic. Exploring the behavior of conauto on graphs that are the disjoint union of connected components, we observed that it was not able to identify cases in which components in both graphs had already been matched. This led to many redundant attempts to match components.

Note that, once a component $C_G$ of a graph $G$ has been found isomorphic to a component $C_H$ of a graph $H$, it is of no use trying to match $C_G$ to another component of $H$. Besides, if $C_G$ cannot be matched to any component of $H$, it is of no use trying to match the other components, since, in the end, the graphs cannot be isomorphic. After a thorough study of the behavior of conauto for these graphs, we have concluded that its performance can be drastically improved in these cases by directly applying the following theorem (whose proof can be found in the Appendix):

Theorem 2 During the search for a sequence of partitions compatible with the target, backtracking from a level $l$ to a level $k < l$, such that each cell of level $l$ is contained in a different cell of level $k$, cannot provide a compatible partition.
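The hypothesis of Theorem 2 is a simple containment test between two partitions of the same sequence. A sketch, with partitions given as lists of vertex lists and a hypothetical function name:

```python
def cells_in_distinct_cells(level_l, level_k):
    """True if each cell of the level-l partition lies inside a different
    cell of the level-k partition (the hypothesis of Theorem 2)."""
    owner = {v: idx for idx, cell in enumerate(level_k) for v in cell}
    used = set()
    for cell in level_l:
        containers = {owner.get(v) for v in cell}
        if len(containers) != 1 or None in containers:
            return False            # not contained in a single level-k cell
        (container,) = containers
        if container in used:
            return False            # two level-l cells share a level-k cell
        used.add(container)
    return True
```

When this returns True for levels k < l, Theorem 2 says that backtracking from level l to level k is pointless.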



3 Conauto-1.2

In this section we propose a new algorithm, conauto-1.2 (described in Algorithm 1), which is based on algorithm conauto [9] and uses the result of Theorem 2 to drastically reduce backtracking. It starts by generating a sequence of partitions for each of the graphs being tested (using function GenerateSequenceOfPartitions), and performing a limited search for automorphisms using function FindAutomorphisms, just like conauto. The difference with conauto is that, during the search for the compatible sequence of partitions (Match), the algorithm does not always backtrack to the previous recursive call (the previous level in the sequence of partitions). Instead, it may backtrack directly to a much earlier level, or even stop the search and conclude that the graphs are not isomorphic, skipping intermediate backtracking points.

Function GenerateSequenceOfPartitions is the same one used by conauto (see [9] for the details). It is worth mentioning that it generates a sequence of partitions according to the following criteria:

1. It starts with the degree partition, and ends when it gets a partition in which no non-singleton cell has remaining links.

2. The pivot cell used for a refinement must always have remaining links (the more, the better).

3. At each level, a vertex refinement with a singleton pivot cell is the preferred choice.

4. The second best choice is to perform a set refinement, preferring small cells over big ones.

5. If the previous refinements cannot be used, then a vertex is chosen from the pivot cell (the smallest cell with links), a vertex refinement is performed with that pivot vertex, and a backtracking point arises.

Function FindAutomorphisms is also the same one used by conauto (see [9] for the details). It takes as input a sequence of partitions for a graph, and generates an extended sequence of partitions. In the process, it tries to eliminate backtracking points, and builds a semiorbit partition of the vertices with the information on vertex equivalences it gathers. Recall that two vertices are equivalent if there is an automorphism that maps one to the other, i.e., if there are two compatible sequences of partitions for the graph in which one vertex takes the place of the other.
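The generation criteria listed above can be approximated by the following simplified sketch. It is not conauto's implementation: it ignores the "the more links, the better" tie-breaking, picks the first vertex of the pivot cell at backtracking points (a hypothetical choice), evaluates links within the union of the current cells, and accepts a set refinement only if it actually changes the partition. All names are hypothetical.

```python
from collections import defaultdict

def adeg(A, v, subset):
    counts = [0, 0, 0, 0]                    # counts[i] = #u in subset with A[v][u] == i
    for u in subset:
        counts[A[v][u]] += 1
    return (counts[3], counts[2], counts[1])

def split(A, cell, pivot):
    """Group `cell` by ADeg towards `pivot`, largest degree first."""
    groups = defaultdict(list)
    for w in cell:
        groups[adeg(A, w, pivot)].append(w)
    return [sorted(groups[d]) for d in sorted(groups, reverse=True)]

def has_links(A, cell, universe):
    return any(adeg(A, w, universe) != (0, 0, 0) for w in cell)

def refine(A, partition, pivot, drop=None):
    """Refine every linked cell by `pivot`; linkless cells are discarded and
    `drop` (the pivot vertex of a vertex refinement) is removed."""
    universe = [w for c in partition for w in c]
    out = []
    for cell in partition:
        if has_links(A, cell, universe):
            out += split(A, [w for w in cell if w != drop], pivot)
    return out

def generate_sequence(A):
    """Return (partitions, tags, pivot indices) following criteria 1-5."""
    V = list(range(len(A)))
    parts, tags, pivots = [split(A, V, V)], [], []   # criterion 1: degree partition
    while True:
        S = parts[-1]
        U = [w for c in S for w in c]
        linked = [i for i, c in enumerate(S) if has_links(A, c, U)]   # criterion 2
        if all(len(S[i]) == 1 for i in linked):
            return parts, tags, pivots               # no non-singleton cell has links
        singles = [i for i in linked if len(S[i]) == 1]
        if singles:                                  # criterion 3
            i, tag = singles[0], "VERTEX"
            nxt = refine(A, S, S[i], drop=S[i][0])
        else:
            nxt = None
            for i in sorted(linked, key=lambda j: len(S[j])):   # criterion 4
                candidate = refine(A, S, S[i])
                if candidate != S:                   # only accept a real refinement
                    nxt, tag = candidate, "SET"
                    break
            if nxt is None:                          # criterion 5
                i = min(linked, key=lambda j: len(S[j]))
                v = S[i][0]
                nxt, tag = refine(A, S, [v], drop=v), "BACKTRACK"
        parts.append(nxt)
        tags.append(tag)
        pivots.append(i)
```

For a star with center 0, the sketch performs a single VERTEX step; for the path 0-1-2-3 it needs one BACKTRACK step, because the degree partition [[1, 2], [0, 3]] is already equitable.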
Algorithm 1 Test whether G and H are isomorphic (conauto-1.2).

AreIsomorphic(G, H) : boolean

1 Q_G ← GenerateSequenceOfPartitions(G)
2 Q_H ← GenerateSequenceOfPartitions(H)
3 E_G ← FindAutomorphisms(G, Q_G)
4 E_H ← FindAutomorphisms(H, Q_H)
5 if BacktrackAmount(SeqPart(E_G)) < BacktrackAmount(SeqPart(E_H)) then
6     return −1 < Match(0, G, H, SeqPart(E_G), Orbits(E_H))
7 else
8     return −1 < Match(0, H, G, SeqPart(E_H), Orbits(E_G))
9 end if



Algorithm 2 Find a sequence of partitions compatible with the target. 

Match(l, G, H, Q_G, O_H) : integer

1 if partition labeled FIN and the adjacencies in both partitions match then
2     return l
3 else if partition labeled VERTEX and vertex refinement compatible then
4     l' ← Match(l + 1, G, H, Q_G, O_H)
5     if l ≠ l' then return l'
6 else if partition labeled SET and set refinement compatible then
7     l' ← Match(l + 1, G, H, Q_G, O_H)
8     if l ≠ l' then return l'
9 else if partition labeled BACKTRACK then
10    for each vertex v in the pivot cell, while NOT success do
11        if v may NOT be discarded according to O_H and vertex refinement compatible then
12            l' ← Match(l + 1, G, H, Q_G, O_H)
13            if l ≠ l' then return l'
14        end if
15    end for
16 end if
17 return the nearest previous level l' at which two cells of level l lie in the same cell (see Theorem 2), or −1 if there is none



Function Match (Algorithm 2) uses backtracking, attempting to find a sequence of partitions for graph H that is compatible with the one for graph G. At backtracking points, it tries every feasible vertex in the pivot cell, so that no possible solution is missed.

Note that, unlike in conauto, the function Match of conauto-1.2 does not return a boolean, but an integer. If Match returns −1, a mismatch has been found at some level $l$ such that there is no previous level $l'$ at which a cell contains (at least) two cells of the partition of level $l$. Hence, from Theorem 2, there is no other feasible alternative in the search space that can yield an isomorphism of the graphs. If it returns a value that is higher than the current level, then a match has been found, the graphs are isomorphic, and there is no need to continue the search; in this case the call immediately returns with this value. If it returns a value that is lower than the current level, then it is necessary to backtrack to that level, since trying another option at this level is meaningless according to Theorem 2; hence the algorithm also returns immediately with that value. If the recursive call made at level $l$ returns $l$, then another alternative at this level $l$ should be tried, if possible. In any other case, it applies Theorem 2 directly, and returns the closest (previous) level $l'$ at which two cells of the current level $l$ belong to the same cell of the partition of level $l'$. If no such previous level exists, it returns −1.
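The backtracking target described above can be computed by scanning earlier levels, starting from the nearest one. In this sketch (hypothetical name, partitions given as lists of vertex lists), `levels[k]` is the partition at level k. Because each partition refines the previous ones, one representative vertex per level-l cell is enough to locate its enclosing cell.

```python
def backtrack_target(levels, l):
    """Nearest previous level k < l whose partition has a cell containing at
    least two cells of levels[l]; -1 if there is none (by Theorem 2, no
    earlier alternative can then lead to an isomorphism)."""
    for k in range(l - 1, -1, -1):
        owner = {v: idx for idx, cell in enumerate(levels[k]) for v in cell}
        containers = [owner[cell[0]] for cell in levels[l]]
        if len(containers) != len(set(containers)):
            return k
    return -1
```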
4 Performance Evaluation



In this section we compare the practical performance of conauto-1.2 with nauty and bliss, two well-known algorithms that are considered the fastest algorithms for isomorphism testing and canonical labeling. In the performance evaluation experiments, we have run these programs with instances (pairs of graphs) that belong to specific families. We also use conauto to show the improvement achieved by conauto-1.2 for these graph families. Undirected and directed (when possible) graphs of different sizes (number of nodes) have been considered. The experiments include instances of isomorphic and non-isomorphic pairs of graphs.

4.1 Graph Families 

For the evaluation, we have built some families of graphs with regularly-connected components. 
The general construction technique of these graphs consists of combining small components of 
different types by either (1) connecting every vertex of each component to all the vertices of the 
other components, (2) connecting only some vertices in each component to some vertices in all the 
other components, or (3) applying the latter construction in two levels. The use of these techniques 
guarantees that the resulting graph is connected, which is convenient for evaluating algorithms that require connectivity (e.g., vf2 [4]). Using the disjoint union of connected components yields similar experimental results.

Next, we describe each family of graphs used. In fact, as the reader will easily infer, the key 
point in all these constructions is that the components are either disconnected, or connected via 
complete n-partite graphs. Hence, multiple other constructions may be used which would yield 
similar results. In each graph family, one hundred pairs of isomorphic and non-isomorphic graphs 
have been generated for each graph size (up to approximately 1,000 vertices). 

Unions of Strongly Regular Graphs This graph family is built from a set of 20 strongly 
regular graphs with parameters (29, 14, 6, 7) as components. The components are interconnected 
so that each vertex in one component is connected to every vertex in the other components. This 
is equivalent to complementing the components, then applying the disjoint union, and finally complementing the result. Graphs with up to $20 \times 29 = 580$ vertices have only one copy of each component, and bigger
ones may have more than one copy of each component. 
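The equivalence between joining components and complementing a disjoint union of complements can be checked on toy inputs. The sketch below represents undirected edges as frozensets and uses tiny hypothetical components, not the (29, 14, 6, 7) strongly regular graphs used in the experiments; all function names are ours.

```python
from itertools import combinations

def complement(vertices, edges):
    """Complement of a simple undirected graph; edges are frozensets {u, v}."""
    return {frozenset(p) for p in combinations(sorted(vertices), 2)} - edges

def place(components):
    """Give component i the labels offset_i, ..., offset_i + size_i - 1."""
    offset, blocks = 0, []
    for size, edges in components:
        vertices = set(range(offset, offset + size))
        blocks.append((vertices, {frozenset(u + offset for u in e) for e in edges}))
        offset += size
    return blocks

def join_direct(components):
    """Connect every vertex of each component to all vertices of the others."""
    blocks = place(components)
    edges = set().union(*(e for _, e in blocks))
    for (va, _), (vb, _) in combinations(blocks, 2):
        edges |= {frozenset((a, b)) for a in va for b in vb}
    return edges

def join_via_complement(components):
    """Complement each component, take the disjoint union, complement again."""
    blocks = place(components)
    all_vertices = set().union(*(v for v, _ in blocks))
    union_of_complements = set().union(*(complement(v, e) for v, e in blocks))
    return complement(all_vertices, union_of_complements)
```

For a component with 3 vertices and one edge joined to an edgeless 2-vertex component, both constructions give the same 7 edges.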

Unions of Tripartite Graphs For this family, we use the digraphs in Figure 1 as the basic
components. For the positive tests (isomorphic graphs) we use the same number of components of 
each type, while for the negative tests we use one graph with the same number of components of 
each type, and another graph in which one component has been replaced by one of the other type. 

The connections between components have been made in the following way. The vertices in the A subset of each component are connected to all the vertices in the B subsets of the other components (see Figure 1 to locate these subsets). The arcs are directed from the vertices in the A subsets to the vertices in the B subsets. From the previously described graphs, we have obtained an undirected version by transforming every (directed) arc into an (undirected) edge.
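The components of Figure 1 are not reproduced here, so the following sketch uses a hypothetical component description: a vertex count, internal arcs, and the index lists of its A and B subsets (local labels). It only illustrates the A-to-B interconnection rule and the undirected variant; all names are ours.

```python
def connect_tripartite(components):
    """components: list of (size, arcs, A, B) with local labels 0..size-1.
    Returns (number of vertices, arc set): the internal arcs of every
    component plus an arc from each A-vertex of a component to each
    B-vertex of every other component."""
    offsets, total = [], 0
    for size, *_ in components:
        offsets.append(total)
        total += size
    arcs = set()
    for off, (_, internal, _, _) in zip(offsets, components):
        arcs |= {(u + off, v + off) for u, v in internal}
    for i, (oi, (_, _, A, _)) in enumerate(zip(offsets, components)):
        for j, (oj, (_, _, _, B)) in enumerate(zip(offsets, components)):
            if i != j:
                arcs |= {(a + oi, b + oj) for a in A for b in B}
    return total, arcs

def undirected_version(arcs):
    """Turn every directed arc into an undirected edge."""
    return {frozenset(arc) for arc in arcs}
```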

Hypo-Hamiltonian Graphs 2-level-connected For this family we use two non-isomorphic 
Hypo-Hamiltonian graphs with 22 vertices. Both graphs have four orbits of sizes: one, three, six, 
and twelve. These basic components are interconnected at two levels. Let us call the vertices in the orbits of size one the 1-orbit vertices, and the vertices in the orbits of size three the 3-orbit vertices. In the first level, we connect n basic components to form a first-level component, by connecting all the 3-orbit vertices in each basic component to all the 3-orbit vertices of the other basic components. In this construction, the 3-orbit vertices, along with the new edges added to interconnect the n basic components, form a complete n-partite graph. Then, in the second level, m first-level components are interconnected by adding edges that connect the 1-orbit vertices of each first-level component with all the 1-orbit vertices of the other first-level components. Again, the 1-orbit vertices, along with the edges connecting them, form a complete m-partite graph. Since we use two Hypo-Hamiltonian graphs as basic components, to generate negative isomorphism cases, a component of one type is replaced with one of the other type.

Figure 1: Tripartite graphs used as components.
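A sketch of the two-level construction, assuming a single basic component given with its 1-orbit and 3-orbit vertex lists. The experiments use two different Hypo-Hamiltonian graphs, and the replacement used for negative cases is omitted here; names and the toy input are hypothetical.

```python
from itertools import combinations

def complete_multipartite(groups):
    """Edges joining every pair of vertices that lie in different groups."""
    return {frozenset((a, b))
            for g, h in combinations(groups, 2) for a in g for b in h}

def two_level(basic, n, m):
    """basic = (size, edges, one_orbit, three_orbit), local labels 0..size-1.
    Builds m first-level components, each made of n copies of `basic`."""
    size, edges, one_orbit, three_orbit = basic
    all_edges, first_level_groups, offset = set(), [], 0
    for _ in range(m):
        three_groups, one_vertices = [], []
        for _ in range(n):
            all_edges |= {frozenset(u + offset for u in e) for e in edges}
            three_groups.append([v + offset for v in three_orbit])
            one_vertices += [v + offset for v in one_orbit]
            offset += size
        all_edges |= complete_multipartite(three_groups)      # first level
        first_level_groups.append(one_vertices)
    all_edges |= complete_multipartite(first_level_groups)    # second level
    return offset, all_edges
```

With a star K(1,3) as a stand-in basic component (center as the 1-orbit, leaves as the 3-orbit) and n = m = 2, the result has 16 vertices and 12 + 18 + 4 = 34 edges.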

4.2 Evaluation Results 

The performance of the four programs has been evaluated in terms of their execution time with 
multiple instances of graphs from the previously defined families. The execution times have been 
measured on a Pentium III at 1.0 GHz with 256 MB of main memory, under Linux RedHat 9.0. The same compiler (GNU gcc) and the same optimization flag (-O) have been used to compile all the programs. The time measured is the real execution time (not only CPU time) of the programs. This time does not include the time to load the graphs from disk into memory. A time limit of 10,000 seconds has been set for each execution. When the execution of a program with graphs of size s reaches this limit, all the execution data of that program for graphs of the same family with size no smaller than s are discarded.
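The discarding rule can be expressed as a small post-processing step. The sketch assumes that results are stored per graph size as lists of times, and that a run hitting the limit is recorded with a time of at least 10,000 seconds; the function name is hypothetical.

```python
TIME_LIMIT = 10_000.0  # seconds per execution, as in the experiments

def discard_after_timeout(results):
    """results maps graph size -> list of execution times (one program, one
    family). If some run of size s reached the limit, drop all sizes >= s."""
    timed_out = [s for s, times in results.items()
                 if any(t >= TIME_LIMIT for t in times)]
    if not timed_out:
        return dict(results)
    cutoff = min(timed_out)
    return {s: times for s, times in results.items() if s < cutoff}
```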

Average Execution Time The results of the experiments are first presented, in Figure 2, as
curves that represent execution time as a function of graph size. In these curves, each point is the 
average execution time of the corresponding program on all the instances of the corresponding size. 

It was previously known that nauty requires exponential time to process graphs that are unions 
of strongly regular graphs [12]. From our results, we conjecture that bliss has the same problem. 
That does not apply to conauto-1.2, though. While the original conauto had problems with non-isomorphic pairs of graphs, conauto-1.2 overcomes this problem.

With the family of unions of tripartite graphs, we have run both positive and negative experiments with directed and undirected versions of the graphs. In all cases, conauto-1.2 has a very low execution time. (Again, the improvement of conauto-1.2 over conauto is apparent in the case of negative tests.) Observe that there are no significant differences in the execution times of bliss and conauto-1.2 between the directed and the undirected cases. However, nauty is slower with directed graphs, even when using the adjacencies invariant specifically designed for directed graphs.

Figure 2: Average execution time.

Our last graph family, Cubic Hypohamiltonian 2-level-connected graphs, has a more complex structure than the other families, having two levels of interconnection. However, the results do not differ significantly from the previous ones. These graphs seem to be slightly easier to process for bliss (compared with the other graph families), but not for nauty. As in the previous cases, conauto-1.2 is fast and consistent with the graphs in this family. It clearly improves on the results of conauto for the non-isomorphic pairs of graphs.
Figure 3: Normalized Standard Deviation of execution times.



Standard Deviation In addition to the average behavior for each graph size, we have also evaluated the regularity of the behavior of the programs. By regular behavior we mean that the time required to process any pair of graphs of the same family and size is very similar. We have observed that conauto-1.2 is not only fast for all these families of graphs, but also has a very regular behavior. However, that holds for neither nauty nor bliss. This is illustrated with the plots of the normalized standard deviation^4 (NSD) shown in Figure 3. Algorithm conauto-1.2 has an NSD that remains almost constant, and very close to zero, for all graph sizes, and even decreases for larger graphs. However, nauty and bliss have a much more erratic behavior. In the case of conauto, we see that its problems arise when it faces negative tests, where the NSD rapidly grows.
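The NSD (standard deviation divided by the mean) is a one-liner. The sketch assumes the n-1 sample estimator (`statistics.stdev`); if the population form is intended, `statistics.pstdev` can be used instead. The function name is hypothetical.

```python
from statistics import mean, stdev

def normalized_std(times):
    """Normalized standard deviation: sample standard deviation / mean."""
    return stdev(times) / mean(times)
```

For example, the times [1.0, 2.0, 3.0] have a sample standard deviation of 1.0 and a mean of 2.0, so their NSD is 0.5.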



5 Conclusions and Future Work 

We have presented a result (the Components Theorem, Theorem 2) that can be applied in GI algorithms to efficiently find automorphisms. Then, we have applied this result to transform the algorithm conauto into conauto-1.2. Algorithm conauto-1.2 has been shown to be fast and consistent in performance for a variety of graph families. However, the algorithm conauto-1.2 can still be improved in several ways: (1) by adding the capability of computing a complete set of generators for the automorphism group, (2) by making extensive use of discovered automorphisms during the match process, and (3) by computing canonical forms of graphs. In all these possible improvements, the Components Theorem will surely help. Additionally, the Components Theorem might also be used by nauty and bliss to improve their performance for the graph families considered, at low cost.

4 The normalized standard deviation is obtained by dividing the standard deviation of the sample by the mean. 
A Proof of the Components Theorem (Theorem 2)

The following definition will be needed in the proof. 

Definition 9 Let $G = (V, R)$ be a graph. Let $V' \subseteq V$. Then the subgraph induced by $V'$ on $G$, denoted $G_{V'}$, is the graph $H = (V', R')$ such that $R' = \{(u, v) : u, v \in V' \wedge (u, v) \in R\}$.

A backtracking point arises when a partition does not have singleton cells (suitable for a vertex 
refinement) and it is not possible to refine such partition by means of a set refinement. Let us 
introduce a new concept that will be useful in the following discussion. 

Definition 10 Let $G = (V, R)$ be a graph, and let $\mathcal{S} = (S_1, \ldots, S_r)$ be a partition of $V$. $\mathcal{S}$ is said to be equitable (with respect to $G$) if for all $i \in \{1, \ldots, r\}$, for all $u, v \in S_i$, for all $j \in \{1, \ldots, r\}$, $\mathrm{ADeg}(u, S_j, G) = \mathrm{ADeg}(v, S_j, G)$.
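Definition 10 translates directly into a check over all pairs of cells. This sketch uses the same adjacency-matrix representation as the earlier examples; the function names are hypothetical.

```python
def adeg(A, v, subset):
    counts = [0, 0, 0, 0]
    for u in subset:
        counts[A[v][u]] += 1
    return (counts[3], counts[2], counts[1])

def is_equitable(A, partition):
    """Definition 10: for every pair of cells S_i, S_j, all vertices of S_i
    have the same available degree towards S_j."""
    return all(len({adeg(A, u, S_j) for u in S_i}) == 1
               for S_i in partition for S_j in partition)
```

On the path 0-1-2-3, the degree partition [[1, 2], [0, 3]] is equitable, while the single cell [[0, 1, 2, 3]] is not.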

Observation 1 The partition at a backtracking point is equitable. 

Proof: Assume otherwise. Then there exists some $S_j$ such that there are two vertices $u, v$ in some $S_i$ with $\mathrm{ADeg}(u, S_j, G) \neq \mathrm{ADeg}(v, S_j, G)$. Therefore, it would be possible to perform a set refinement on the partition using $S_j$ as the pivot cell; vertices $u$ and $v$ would be distinguished by this refinement, and cell $S_i$ would be split. This is not possible since, at a backtracking point, no set refinement has succeeded. ■
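The argument of Observation 1 can be illustrated with a simplified set refinement, sketched below in Python under stated assumptions: adjacency types are encoded as 0 (no arc), 1 (u→v only), 2 (v→u only), and 3 (both), ADeg counts the vertices of the pivot cell with each type, and every cell is split according to the ADeg of its vertices towards the pivot cell. This is not conauto's SetRefinement procedure; details such as vertex discarding and the ordering of the new cells are our own choices, and all names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

# Hypothetical, simplified refinement for illustration only.
Vertex = Hashable
Arcs = Set[Tuple[Vertex, Vertex]]


def adj_type(arcs: Arcs, u: Vertex, v: Vertex) -> int:
    # Assumed encoding: 0 = no arc, 1 = u->v only, 2 = v->u only, 3 = both.
    return (1 if (u, v) in arcs else 0) + (2 if (v, u) in arcs else 0)


def adeg(u: Vertex, cell: Sequence[Vertex], arcs: Arcs) -> Tuple[int, ...]:
    counts = [0, 0, 0, 0]
    for w in cell:
        counts[adj_type(arcs, u, w)] += 1
    return tuple(counts)


def set_refinement(partition: Sequence[Sequence[Vertex]], pivot: int,
                   arcs: Arcs) -> List[List[Vertex]]:
    """Split every cell by the ADeg of its vertices towards the pivot cell."""
    pivot_cell = partition[pivot]
    refined: List[List[Vertex]] = []
    for cell in partition:
        groups: Dict[Tuple[int, ...], List[Vertex]] = {}
        for u in cell:
            groups.setdefault(adeg(u, pivot_cell, arcs), []).append(u)
        # Sub-cells ordered by their ADeg key: an arbitrary but deterministic choice.
        refined.extend(groups[key] for key in sorted(groups))
    return refined


def no_set_refinement_succeeds(partition: Sequence[Sequence[Vertex]],
                               arcs: Arcs) -> bool:
    """True when no pivot cell splits any cell, as at a backtracking point."""
    return all(len(set_refinement(partition, j, arcs)) == len(partition)
               for j in range(len(partition)))
```

A partition on which no pivot splits any cell is, by construction, equitable, which is exactly the content of Observation 1.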

Observation 2 Let $l$ be a backtracking level and let $S^l = (S^l_1, \ldots, S^l_r)$ be the partition at that level. Then, for all $i \in \{1, \ldots, r\}$, $G_{S^l_i}$ is regular.

Proof: From Observation 1, $S^l$ is equitable. Fix $i \in \{1, \ldots, r\}$; then, from Definition 10, for all $u, v \in S^l_i$, $\mathrm{ADeg}(u, S^l_i, G) = \mathrm{ADeg}(v, S^l_i, G)$. Therefore, $G_{S^l_i}$ is regular, for all $i \in \{1, \ldots, r\}$. ■
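Definition 9 and Observation 2 can be checked on small digraphs with the following Python sketch. We read "regular" as: every vertex of the induced subgraph has the same number of out-only, in-only, and mutual neighbours inside it. This reading, the loopless-graph assumption, and the function names are assumptions of the sketch, not definitions taken from the paper.

```python
from typing import Hashable, Iterable, Set, Tuple

# Hypothetical helpers; assumes a loopless digraph given as a set of arcs.
Vertex = Hashable
Arcs = Set[Tuple[Vertex, Vertex]]


def induced_arcs(arcs: Arcs, subset: Iterable[Vertex]) -> Arcs:
    """Definition 9: keep only the arcs with both endpoints in the subset."""
    keep = set(subset)
    return {(u, v) for (u, v) in arcs if u in keep and v in keep}


def type_profile(u: Vertex, vertices: Set[Vertex], arcs: Arcs) -> Tuple[int, int, int]:
    """Counts of out-only, in-only and mutual neighbours of u (assumed notion)."""
    out_only = sum(1 for v in vertices if (u, v) in arcs and (v, u) not in arcs)
    in_only = sum(1 for v in vertices if (v, u) in arcs and (u, v) not in arcs)
    mutual = sum(1 for v in vertices if (u, v) in arcs and (v, u) in arcs)
    return out_only, in_only, mutual


def cell_is_regular(arcs: Arcs, cell: Iterable[Vertex]) -> bool:
    """Do all vertices of the subgraph induced by `cell` share one type profile?"""
    vertices = set(cell)
    sub = induced_arcs(arcs, vertices)
    return len({type_profile(u, vertices, sub) for u in vertices}) <= 1
```

For instance, in a digraph containing the directed 3-cycle on $\{0, 1, 2\}$ plus an arc leaving the cell, the cell $\{0, 1, 2\}$ still induces a regular subgraph, because Definition 9 drops the external arc.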


Let $Q = (S, R, P)$ be a sequence of partitions for graph $G = (V, R)$, where $S = (S^0, \ldots, S^t)$, $R = (R^0, \ldots, R^{t-1})$, and $P = (P^0, \ldots, P^{t-1})$. For all $i \in \{0, \ldots, t\}$, let $S^i = (S^i_1, \ldots, S^i_{r_i})$ and $V^i = \bigcup_{j=1}^{r_i} S^i_j$. We consider two backtracking levels $k$ and $l$ that satisfy the preconditions of Theorem 2, i.e., $k < l$ and each cell of $S^l$ is contained in a different cell of $S^k$.
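The second precondition (each cell of $S^l$ contained in a different cell of $S^k$) can be checked directly on the two partitions. The following Python sketch represents a partition as a list of cells; the function name is hypothetical, and cells are assumed to be non-empty, as in a real sequence of partitions.

```python
from typing import Hashable, Sequence

Cells = Sequence[Sequence[Hashable]]


def components_precondition(cells_l: Cells, cells_k: Cells) -> bool:
    """True if every cell of S^l lies inside a cell of S^k and no two share one."""
    # Map each vertex of level k to the index of its cell.
    owner = {v: idx for idx, cell in enumerate(cells_k) for v in cell}
    used = set()
    for cell in cells_l:
        owners = {owner.get(v) for v in cell}
        if len(owners) != 1 or None in owners:
            return False  # empty cell, cell spread over several cells, or stray vertex
        (idx,) = owners
        if idx in used:
            return False  # two cells of S^l inside the same cell of S^k
        used.add(idx)
    return True
```

With $S^k = (\{0,1,2\}, \{3,4,5\})$, the partition $(\{1\}, \{4\})$ satisfies the condition, whereas $(\{0\}, \{1\})$ does not, since both of its cells lie in $\{0,1,2\}$.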

Let $p \in S^k_{P^k}$ be the pivot vertex used for the vertex refinement at level $k$. Assume there is a vertex $q \in S^k_{P^k}$, $q \neq p$, that satisfies the following: $T^{k+1} = \mathrm{VertexRefinement}(S^k, q, G_{V^k})$ is a partition that is compatible with $S^{k+1}$. Let $T^{k+1} = (T^{k+1}_1, \ldots, T^{k+1}_{r_{k+1}})$ and $W^{k+1} = \bigcup_{j=1}^{r_{k+1}} T^{k+1}_j$. For all $i \in \{k+2, \ldots, l\}$, let $T^i = (T^i_1, \ldots, T^i_{r_i})$ be compatible with $S^i$, where $W^i = \bigcup_{j=1}^{r_i} T^i_j$, $T^i = \mathrm{SetRefinement}(T^{i-1}, T^{i-1}_{P^{i-1}}, G_{W^{i-1}})$ if $R^{i-1} = \mathrm{SET}$, and $T^i = \mathrm{VertexRefinement}(T^{i-1}, v, G_{W^{i-1}})$ for some $v \in T^{i-1}_{P^{i-1}}$ if $R^{i-1} \neq \mathrm{SET}$. This generates an alternative sequence of partitions that is compatible with the original one up to level $l$.

Under these premises, we show in the rest of this section that $G_{V^l}$ and $G_{W^l}$ are isomorphic, and that there is an isomorphism of them that matches the vertices in $S^l_i$ to the vertices in $T^l_i$ for all $i \in \{1, \ldots, r_l\}$.

To simplify the notation, let us assume $r_k = r_l = r$. Note that in this case, for all $i \in \{1, \ldots, r\}$, $S^l_i \subseteq S^k_i$. When $r_k \neq r_l$ this correspondence is not trivial. However, we can safely assume that some cells $S^l_i \in S^l$ may be empty, and develop our argument considering this possibility, although we know that in the real sequence of partitions these empty cells would have been discarded.

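For all $i \in \{1, \ldots, r\}$, let $E_i = S^k_i \setminus S^l_i$ and $E'_i = S^k_i \setminus T^l_i$ be the vertices discarded in the refinements from $S^k_i$ to $S^l_i$ and to $T^l_i$, respectively. Let $A_i = E_i \cap E'_i$ be the vertices discarded in both alternative refinements, $B_i = E_i \setminus A_i$ the vertices discarded only in the refinement from $S^k_i$ to $S^l_i$, $C_i = E'_i \setminus A_i$ the vertices discarded only in the refinement from $S^k_i$ to $T^l_i$, and $D_i = S^l_i \cap T^l_i$ the vertices remaining in both alternative partitions at level $l$. Let $A = \bigcup_{i=1}^{r} A_i$, $B = \bigcup_{i=1}^{r} B_i$, $C = \bigcup_{i=1}^{r} C_i$, $D = \bigcup_{i=1}^{r} D_i$, $E = \bigcup_{i=1}^{r} E_i$, and $E' = \bigcup_{i=1}^{r} E'_i$. Clearly, $E = A \cup B$ and $E' = A \cup C$. Observe that $|E_i| = |E'_i|$, since the compatible cells $S^l_i$ and $T^l_i$ have the same size, and hence $|B_i| = |C_i|$ for all $i \in \{1, \ldots, r\}$.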
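The decomposition of a cell $S^k_i$ into $A_i$, $B_i$, $C_i$, and $D_i$ is plain set algebra. The Python sketch below computes it from the three cells $S^k_i$, $S^l_i$, and $T^l_i$; the function name is hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple

Cell = Set[Hashable]


def split_cell(sk_i: Iterable[Hashable], sl_i: Iterable[Hashable],
               tl_i: Iterable[Hashable]) -> Tuple[Cell, Cell, Cell, Cell]:
    """Return (A_i, B_i, C_i, D_i) for the cells S^k_i, S^l_i and T^l_i."""
    sk, sl, tl = set(sk_i), set(sl_i), set(tl_i)
    e = sk - sl          # E_i: discarded on the way to S^l_i
    e_prime = sk - tl    # E'_i: discarded on the way to T^l_i
    a = e & e_prime      # A_i: discarded in both refinements
    b = e - a            # B_i: discarded only towards S^l_i
    c = e_prime - a      # C_i: discarded only towards T^l_i
    d = sl & tl          # D_i: kept in both refinements
    return a, b, c, d
```

For example, with $S^k_i = \{0, \ldots, 5\}$, $S^l_i = \{0, 1, 2\}$, and $T^l_i = \{1, 2, 3\}$, the function returns $A_i = \{4, 5\}$, $B_i = \{3\}$, $C_i = \{0\}$, and $D_i = \{1, 2\}$; $B_i$ and $C_i$ have equal size because the two kept cells do.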



Figure 4: Partition of $S^k_i$ into subsets $A_i$, $B_i$, $C_i$, and $D_i$, for all $i \in \{1, \ldots, r\}$.

Observation 3 $G_E$ is isomorphic to $G_{E'}$, and there is an isomorphism of them that matches the vertices in $E_i$ to those in $E'_i$, for all $i \in \{1, \ldots, r\}$.

Proof: Direct from the construction of the sequences of partitions. ■ 



Lemma 1 Let $M = \mathrm{Adj}(G)$. Then:

• For each $u \in E$, for all $i \in \{1, \ldots, r\}$, for all $v, w \in S^l_i$, $M_{uv} = M_{uw}$ and $M_{vu} = M_{wu}$.

• For each $u \in E'$, for all $i \in \{1, \ldots, r\}$, for all $v, w \in T^l_i$, $M_{uv} = M_{uw}$ and $M_{vu} = M_{wu}$.

Proof: None of the vertices in $E$ has been able to distinguish among the vertices in cell $S^l_i$, so each discarded vertex has the same type of adjacency with all the vertices in $S^l_i$. To see this, consider a vertex $u \in E$ and assume that $u$ has at least two different types of adjacency with the vertices in $S^l_i$. Since $u$ was discarded during the refinements from $S^k$ to $S^l$, this must have happened for one of the following reasons:

1. It was discarded for having no links (i.e., all its links are of type 0), which is impossible since it has two different types of adjacency with the vertices in $S^l_i$.

2. It was used as the pivot in a vertex refinement, which is impossible since it would then have split cell $S^l_i$.

The same argument applies to the vertices in $E'$ with respect to the vertices in each cell $T^l_i$. ■
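The property stated in Lemma 1 can be checked mechanically on small examples: a discarded vertex must see every vertex of a surviving cell through one single adjacency type. The Python sketch below assumes the encoding 0 (no arc), 1 (u→v only), 2 (v→u only), 3 (both); because one type value records both arc directions, a single test covers both equalities $M_{uv} = M_{uw}$ and $M_{vu} = M_{wu}$. The function names are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple

Vertex = Hashable
Arcs = Set[Tuple[Vertex, Vertex]]


def adj_type(arcs: Arcs, u: Vertex, v: Vertex) -> int:
    # Assumed encoding: 0 = no arc, 1 = u->v only, 2 = v->u only, 3 = both.
    return (1 if (u, v) in arcs else 0) + (2 if (v, u) in arcs else 0)


def uniform_towards(u: Vertex, cell: Iterable[Vertex], arcs: Arcs) -> bool:
    """Lemma 1 property: u has a single adjacency type with all vertices of `cell`.

    Equal types imply both M_uv = M_uw and M_vu = M_wu, since a type
    encodes the two arc directions together.
    """
    return len({adj_type(arcs, u, v) for v in cell}) <= 1
```

A vertex with arcs to only part of a cell, such as vertex 9 with the single arc $(9, 0)$ towards $\{0, 1, 2\}$, fails the test; by Lemma 1 such a vertex cannot be among the discarded ones.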



Consider that the adjacency between vertex $u$ and vertex $v$ is $M_{uv} = a$ for some $a \in \{0, \ldots, 3\}$. Then we denote the adjacency between $v$ and $u$ ($M_{vu}$) as $a^{-1}$. Note that if $a = 0$, then $a^{-1} = 0$; if $a = 1$, then $a^{-1} = 2$; if $a = 2$, then $a^{-1} = 1$; and if $a = 3$, then $a^{-1} = 3$.
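The inverse adjacency type has a compact arithmetic form under an assumed encoding in which bit 0 of the type records an arc u→v and bit 1 an arc v→u: taking the inverse simply swaps the two bits. The Python sketch below checks this on a small digraph; the types fixed by the inversion are exactly 0 and 3, which is the fact used in Corollary 1. The function names are hypothetical.

```python
def inverse(a: int) -> int:
    """a^{-1}: swap the two direction bits (1 <-> 2; 0 and 3 stay fixed)."""
    return ((a & 1) << 1) | (a >> 1)


def adj_type(arcs, u, v) -> int:
    # Assumed encoding: bit 0 = arc u->v, bit 1 = arc v->u.
    return (1 if (u, v) in arcs else 0) + (2 if (v, u) in arcs else 0)


arcs = {(0, 1), (1, 2), (2, 1)}
for u in range(3):
    for v in range(3):
        # M_vu is always the inverse of M_uv.
        assert adj_type(arcs, v, u) == inverse(adj_type(arcs, u, v))

print([inverse(a) for a in range(4)])            # [0, 2, 1, 3]
print([a for a in range(4) if inverse(a) == a])  # [0, 3]
```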



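Lemma 2 For each $i, j \in \{1, \ldots, r\}$, there is some $a \in \{0, \ldots, 3\}$ such that for all $u \in B_i$, $v \in C_i$, $w \in D_i$, $u' \in B_j$, $v' \in C_j$, and $w' \in D_j$,

$$M_{uv'} = M_{uw'} = M_{vu'} = M_{vw'} = M_{wu'} = M_{wv'} = a \quad\text{and}\quad M_{u'v} = M_{u'w} = M_{v'u} = M_{v'w} = M_{w'u} = M_{w'v} = a^{-1}.$$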
Proof: Let us take any $i \in \{1, \ldots, r\}$ and any $j \in \{1, \ldots, r\}$. Since $B_i \subseteq E$ and $C_j \subseteq S^l_j$, from Lemma 1, for each $u \in B_i$ and all $v' \in C_j$, $M_{uv'} = a$ for some $a \in \{0, \ldots, 3\}$. Let us take any such $v' \in C_j$. Then $M_{v'u} = a^{-1}$ for those particular $v'$ and $u$. Besides, since $C_j \subseteq E'$ and $B_i \subseteq T^l_i$, from Lemma 1, for all $u \in B_i$, $M_{v'u} = b$ for some $b \in \{0, \ldots, 3\}$. Since we already know that $M_{v'u} = a^{-1}$ for that particular pair of vertices, we conclude that for all $u \in B_i$, $v' \in C_j$, $M_{uv'} = a$ and $M_{v'u} = a^{-1}$, for some $a \in \{0, \ldots, 3\}$.

Moreover, $S^l_j = C_j \cup D_j$ and $B_i \subseteq E$. Since for all $u \in B_i$, $v' \in C_j$, $M_{uv'} = a$ and $M_{v'u} = a^{-1}$, Lemma 1 gives that for all $u \in B_i$, $w' \in D_j$, $M_{uw'} = a$ (clearly, the same $a$) and $M_{w'u} = a^{-1}$.

Similarly, $T^l_i = B_i \cup D_i$ and $C_j \subseteq E'$. Since for all $u \in B_i$, $v' \in C_j$, $M_{uv'} = a$ and $M_{v'u} = a^{-1}$, Lemma 1 gives that for all $v' \in C_j$, $w \in D_i$, $M_{v'w} = a^{-1}$ and $M_{wv'} = a$ (clearly, the same $a$).

Furthermore, all the vertices in $S^l_j = C_j \cup D_j$ have the same number of adjacent vertices of each type in $E_i = A_i \cup B_i$; otherwise, they would have been distinguished in the refinement process from $S^k$ to $S^l$. Likewise, all the vertices in $T^l_j = B_j \cup D_j$ have the same number of adjacent vertices of each type in $E'_i = A_i \cup C_i$; otherwise, they would have been distinguished in the refinement process from $S^k$ to $T^l$. Hence, the vertices of $D_j$ must have the same number of adjacent vertices of each type in $B_i$ and in $C_i$. Therefore, since for all $w' \in D_j$ and all $u \in B_i$, $M_{uw'} = a$ and $M_{w'u} = a^{-1}$, then for all $w' \in D_j$ and all $v \in C_i$, $M_{vw'} = a$ and $M_{w'v} = a^{-1}$ too.

A similar argument may be used to prove that for all $w \in D_i$ and all $u' \in B_j$, $M_{wu'} = a$ and $M_{u'w} = a^{-1}$. Then, from Lemma 1, since $B_j \subseteq E$, for all $u' \in B_j$, $M_{u'x} = M_{u'y}$ for all $x, y \in S^l_i$. We already know that for all $u' \in B_j$, $M_{u'w} = a^{-1}$ for all $w \in D_i$, and $S^l_i = C_i \cup D_i$. Hence, for all $v \in C_i$, $M_{u'v} = a^{-1}$ too, and $M_{vu'} = a$.

Putting together all the partial results obtained, we get the assertion stated in the lemma. ■ 



Corollary 1 Let $M = \mathrm{Adj}(G)$. For each $i \in \{1, \ldots, r\}$, for all $u \in B_i$, $v \in C_i$, $w \in D_i$, $M_{uv} = M_{vu} = M_{uw} = M_{wu} = M_{vw} = M_{wv} = a$, where $a \in \{0, 3\}$.



Proof: From Lemma 2 with $i = j$, we get that for all $u \in B_i$, $v \in C_i$, $w \in D_i$, $M_{uv} = M_{uw} = M_{vu} = M_{vw} = M_{wu} = M_{wv} = a$ and also $M_{uv} = M_{uw} = M_{vu} = M_{vw} = M_{wu} = M_{wv} = a^{-1}$. Hence, it must hold that $a = a^{-1}$, so $a \in \{0, 3\}$. ■

Let us define two families of partitions of $A_i$, for $i, j \in \{1, \ldots, r\}$:

$$A_i^{cj} = \{x \in A_i : \forall u \in B_i, v' \in C_j,\ M_{xv'} = M_{uv'}\}$$

$$A_i^{nj} = \{x \in A_i : \forall u \in B_i, v' \in C_j,\ M_{xv'} \neq M_{uv'}\}$$

Note that, since the vertices of $A_i$ are unable to distinguish among the vertices of $C_j$, if $M_{xv'} \neq M_{uv'}$ for some $u \in B_i$ and some $v' \in C_j$, then $M_{xv'} \neq M_{uv'}$ for all $u \in B_i$ and all $v' \in C_j$. Hence, each pair of sets $A_i^{cj}$ and $A_i^{nj}$ defines a partition of $A_i$. Note also that, since each vertex in $A_i$ has the same type of adjacency with all the vertices in $B_j \cup C_j \cup D_j$ (from Lemma 1), for all $x \in A_i^{cj}$, $u \in B_i$, $v \in C_i$, $w \in D_i$, $u' \in B_j$, $v' \in C_j$, and $w' \in D_j$, $M_{xu'} = M_{xv'} = M_{xw'} = M_{uv'} = M_{uw'} = M_{vu'} = M_{vw'} = M_{wu'} = M_{wv'}$ (from Lemma 2).
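The sets $A_i^{cj}$ and $A_i^{nj}$ can be computed by comparing each vertex of $A_i$ with one reference pair $(u, v') \in B_i \times C_j$, which suffices because Lemma 2 makes $M_{uv'}$ constant on that product. The Python sketch below assumes the adjacency encoding 0 (no arc), 1 (u→v only), 2 (v→u only), 3 (both) and non-empty $B_i$ and $C_j$; its names are hypothetical. On inputs that violate Lemma 1, a vertex can land in neither set, which is a quick way to spot a broken hypothesis.

```python
from typing import Hashable, Iterable, Set, Tuple

Vertex = Hashable
Arcs = Set[Tuple[Vertex, Vertex]]


def adj_type(arcs: Arcs, u: Vertex, v: Vertex) -> int:
    # Assumed encoding: 0 = no arc, 1 = u->v only, 2 = v->u only, 3 = both.
    return (1 if (u, v) in arcs else 0) + (2 if (v, u) in arcs else 0)


def split_a(a_i: Iterable[Vertex], b_i: Set[Vertex], c_j: Set[Vertex],
            arcs: Arcs) -> Tuple[Set[Vertex], Set[Vertex]]:
    """Return (A_i^{cj}, A_i^{nj}); assumes B_i and C_j are non-empty."""
    # Reference type M_{uv'} for one sample pair (constant by Lemma 2).
    ref = adj_type(arcs, next(iter(b_i)), next(iter(c_j)))
    a_list = list(a_i)
    a_c = {x for x in a_list if all(adj_type(arcs, x, v) == ref for v in c_j)}
    a_n = {x for x in a_list if all(adj_type(arcs, x, v) != ref for v in c_j)}
    return a_c, a_n
```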

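Lemma 3 For all $i \in \{1, \ldots, r\}$, let $A_i^c = \bigcap_{j=1}^{r} A_i^{cj}$ and $A_i^n = \bigcup_{j=1}^{r} A_i^{nj}$. Then any isomorphism of $G_E$ and $G_{E'}$ that maps $G_{E_j}$ to $G_{E'_j}$, for all $j \in \{1, \ldots, r\}$, maps the vertices in $A_i^n$ among themselves.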
Figure 5: Partition of $A_i$ into subsets $A_i^c$ and $A_i^n$.



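Proof: From Observation 1, partition $S^k$ is equitable. Hence, for each $i, j \in \{1, \ldots, r\}$ and all $u, v \in S^k_i$, $\mathrm{ADeg}(u, S^k_j, G) = \mathrm{ADeg}(v, S^k_j, G)$. Thus, for all $x \in A_i^{cj}$, $y \in A_i^{nj}$, $u \in B_i$, $v \in C_i$, $w \in D_i$, $\mathrm{ADeg}(x, S^k_j, G) = \mathrm{ADeg}(y, S^k_j, G) = \mathrm{ADeg}(u, S^k_j, G) = \mathrm{ADeg}(v, S^k_j, G) = \mathrm{ADeg}(w, S^k_j, G)$.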

Let us take any pair of values of $i$ and $j$. From Lemma 2, all the vertices of $B_i$ have the same type of adjacency with all the vertices of $S^l_j = C_j \cup D_j$. Assume this type of adjacency is $a$. From the definition of $A_i^{cj}$, all the vertices of $A_i^{cj}$ have adjacency $a$ with all the vertices of $S^l_j$. Hence, for $x \in A_i^{cj}$ and $u \in B_i$, $\mathrm{ADeg}(x, S^l_j, G) = \mathrm{ADeg}(u, S^l_j, G)$. Since $\mathrm{ADeg}(x, S^k_j, G) = \mathrm{ADeg}(u, S^k_j, G)$ and $\mathrm{ADeg}(x, S^l_j, G) = \mathrm{ADeg}(u, S^l_j, G)$, it follows that $\mathrm{ADeg}(x, E_j, G) = \mathrm{ADeg}(u, E_j, G)$ (note that $E_j = S^k_j \setminus S^l_j$).

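However, from the definition of $A_i^{nj}$, for $y \in A_i^{nj}$, $\mathrm{ADeg}(y, S^l_j, G) \neq \mathrm{ADeg}(x, S^l_j, G)$. Hence, since $\mathrm{ADeg}(y, S^k_j, G) = \mathrm{ADeg}(x, S^k_j, G)$, we get $\mathrm{ADeg}(y, E_j, G) \neq \mathrm{ADeg}(x, E_j, G)$.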

Since any isomorphism must match vertices with the same adjacency degrees, every isomorphism of $G_E$ and $G_{E'}$ that maps $G_{E_j}$ to $G_{E'_j}$ maps the vertices in $A_i^{nj}$ among themselves.

Applying this argument over all possible values of $j$, we get that any isomorphism of $G_E$ and $G_{E'}$ that maps $G_{E_j}$ to $G_{E'_j}$ for all $j$ maps the vertices in $A_i^n$ among themselves, for all $i \in \{1, \ldots, r\}$. ■

Let us focus on any isomorphism of $G_E$ and $G_{E'}$ that maps $G_{E_i}$ to $G_{E'_i}$ for all $i \in \{1, \ldots, r\}$ (there is at least one, from Observation 3).

Lemma 4 $G_B$ is isomorphic to $G_C$, and there is an isomorphism of them that matches the vertices in $B_i$ to those in $C_i$, for all $i \in \{1, \ldots, r\}$.

Proof: Let us analyze the adjacencies between the vertices in $A_i^c$, $B_i$, $C_i$, $A_j^c$, $B_j$, and $C_j$ for some values of $i$ and $j$. From Corollary 1, for all $u \in B_i$, $v \in C_i$, $M_{uv} = M_{vu} = a$, where $a \in \{0, 3\}$. From the definition of $A_i^c$, for all $x \in A_i^c$, $M_{xu} = M_{xv} = M_{ux} = M_{vx} = M_{uv} = a$.

From Lemma 3, the vertices of $A_i^n$ are mapped among themselves in any isomorphism of $G_E$ and $G_{E'}$ that maps $G_{E_i}$ to $G_{E'_i}$. Hence, the vertices of $A_i^c \cup B_i$ must be mapped to the vertices of $A_i^c \cup C_i$. If $a = 0$, then $A_i^c$, $B_i$, and $C_i$ are disconnected from each other. Hence, $G_{B_i}$ and $G_{C_i}$ must be isomorphic. In the case $a = 3$, taking the inverses (complements) of the graphs leads to the same result.
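The reduction of the case $a = 3$ to the case $a = 0$ can be made concrete if "taking the inverses of the graphs" is read as taking complement digraphs: complementing turns every adjacency type $a$ between distinct vertices into $3 - a$, so all-to-all type 3 becomes all-to-all type 0. That reading, the loopless complement, the encoding 0 (no arc), 1 (u→v only), 2 (v→u only), 3 (both), and the function names are assumptions of this Python sketch.

```python
from itertools import product
from typing import Hashable, Iterable, Set, Tuple

Vertex = Hashable
Arcs = Set[Tuple[Vertex, Vertex]]


def adj_type(arcs: Arcs, u: Vertex, v: Vertex) -> int:
    # Assumed encoding: 0 = no arc, 1 = u->v only, 2 = v->u only, 3 = both.
    return (1 if (u, v) in arcs else 0) + (2 if (v, u) in arcs else 0)


def complement(vertices: Iterable[Vertex], arcs: Arcs) -> Arcs:
    """Loopless complement: (u, v) is an arc iff u != v and (u, v) is not in arcs."""
    vs = list(vertices)
    return {(u, v) for u, v in product(vs, vs) if u != v and (u, v) not in arcs}


def types_complemented(vertices: Iterable[Vertex], arcs: Arcs) -> bool:
    """Check that every type a between distinct vertices becomes 3 - a."""
    vs = list(vertices)
    comp = complement(vs, arcs)
    return all(adj_type(comp, u, v) == 3 - adj_type(arcs, u, v)
               for u, v in product(vs, vs) if u != v)
```

Since complementing is a bijection on digraphs over the same vertex set that commutes with isomorphism, two subgraphs are isomorphic exactly when their complements are.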

From Lemma 2, for each $i, j \in \{1, \ldots, r\}$, there is some $b \in \{0, \ldots, 3\}$ such that for all $u \in B_i$, $v \in C_i$, $u' \in B_j$, $v' \in C_j$, $M_{uv'} = M_{vu'} = b$ and $M_{u'v} = M_{v'u} = b^{-1}$. From the definition of $A_i^c$, for all $x \in A_i^c$ and all $u \in B_i$, $v \in C_i$, $u' \in B_j$, $v' \in C_j$, $M_{xu'} = M_{xv'} = M_{uv'}$.

Putting all this together, we obtain the picture of the adjacencies among $A_i^c$, $B_i$, $C_i$, $A_j^c$, $B_j$, and $C_j$ shown in Figure 6. The connections between the vertices of $A_i^c$ and the vertices of $B_i$, and between the vertices of $A_i^c$ and the vertices of $C_i$, are all-to-all (all the same) of value 0 or 3. Similarly, the adjacencies between the vertices of $A_j^c$ and the vertices of $B_j$, and between the vertices of $A_j^c$ and the vertices of $C_j$, are all the same, all-to-all, of value 0 or 3 (not necessarily equal to those between $A_i^c$ and $B_i$ or $C_i$). The adjacencies between $A_i^c$ and $B_j \cup C_j$ are all the same, all-to-all, of any value in the set $\{0, \ldots, 3\}$. This also applies to the adjacencies between $A_j^c$ and $B_i \cup C_i$.

Figure 6: Adjacencies between $E_i$ and $E_j$, and between $E'_i$ and $E'_j$.

If $G_{B_i \cup B_j}$ is not isomorphic to $G_{C_i \cup C_j}$, the discrepancy must be in the adjacencies between vertices of $B_i$ and $B_j$ with respect to the adjacencies between vertices of $C_i$ and $C_j$. In such a case, in the isomorphism between $G_{E_i \cup E_j}$ and $G_{E'_i \cup E'_j}$ (recall that, from Observation 3, there is an isomorphism of $G_E$ and $G_{E'}$ that maps the vertices of $E_i$ to the vertices in $E'_i$ for all $i \in \{1, \ldots, r\}$), some vertices of $A_i^c$ should be mapped to vertices of $C_i$, and some of the vertices of $B_i$ should be mapped to vertices of $A_i^c$. However, due to the adjacencies among $A_i^c$, $B_i$, $C_i$, $A_j^c$, $B_j$, and $C_j$ shown in Figure 6, that would imply that the adjacencies between the vertices of $B_i$ and $B_j$ had to match adjacencies between the vertices of $A_i^c$ and $A_j^c$. But, in that case, the same adjacency pattern must exist between the vertices of $C_i$ and $C_j$, to match the corresponding subgraph of $G_{E_i \cup E_j}$. Hence, the adjacencies between $B_i$ and $B_j$ could have been matched to the adjacencies between $C_i$ and $C_j$.

Since this applies for all values of $i$ and $j$, we conclude that $G_B$ is isomorphic to $G_C$, and there is an isomorphism of them that matches the vertices in $B_i$ to those in $C_i$, for all $i \in \{1, \ldots, r\}$, completing the proof. ■



Lemma 5 $G_{V^l}$ and $G_{W^l}$ are isomorphic, and there is an isomorphism of them that maps the vertices in $S^l_i$ to the vertices of $T^l_i$ for all $i \in \{1, \ldots, r\}$.

Proof: From Lemma 2, we know that for each $i, j \in \{1, \ldots, r\}$ there is some $a \in \{0, \ldots, 3\}$ such that for all $u \in B_i$, $v \in C_i$, $w \in D_i$, $u' \in B_j$, $v' \in C_j$, and $w' \in D_j$, $M_{uv'} = M_{uw'} = M_{vu'} = M_{vw'} = M_{wu'} = M_{wv'} = a$ and $M_{u'v} = M_{u'w} = M_{v'u} = M_{v'w} = M_{w'u} = M_{w'v} = a^{-1}$.

Note also that, from Corollary 1, for all $u \in B_i$, $v \in C_i$, $w \in D_i$, $M_{uv} = M_{vw} = M_{wu} = a$, where $a \in \{0, 3\}$. This adjacency pattern is shown graphically in the corresponding figure.
From Lemma 4, we know that $G_B$ is isomorphic to $G_C$, and there is an isomorphism of them that matches the vertices in $B_i$ to those in $C_i$, for all $i \in \{1, \ldots, r\}$.

From the fact that $G_D$ is isomorphic to itself, and the previous considerations on the adjacency pattern between the vertices in $B_i$, $C_i$, $D_i$, $B_j$, $C_j$, and $D_j$ for all $i, j \in \{1, \ldots, r\}$ (shown in the corresponding figure), it is easy to see that the isomorphism of $G_B$ and $G_C$ obtained from Lemma 4, together with the trivial automorphism of $G_D$, yields an isomorphism of $G_{V^l}$ and $G_{W^l}$, which completes the proof. ■

We have shown that if two alternative sequences of partitions $S^{k+1}, \ldots, S^l$ and $T^{k+1}, \ldots, T^l$ lead to compatible partitions $S^l$ and $T^l$, where all their cells are subcells of different cells of a previous common level $k$, then the remaining subgraphs are isomorphic, and the vertices in each cell of one partition may be mapped to the vertices in its corresponding cell in the other partition by one such isomorphism. Thus, if during the search for a sequence of partitions compatible with the target we reach an incompatibility at some point beyond level $l$, and we have to backtrack from level $l$ to a level $k$ in which all the cells are different supersets of the cells in the current backtracking point, then trying an alternative compatible path will lead to the same dead end. Hence, it is useless to try another path from such a level $k$, and it is necessary to backtrack to some point where at least two cells in the current backtracking point are subsets of the same cell in the previous backtracking point. This proves Theorem 2.